Google Mobility Data Visualization

Qiong Liu

02/25/2021

Google community mobility data is the open-source data reflecting the mobility changes across global communities in response to policies aimed at combating the COVID-19 crisis. This data set tracks daily movement trends into six categories, including retail and recreation, groceries and pharmacy, parks, transit stations, workplace, and residential. In this notebook, we demonstrate how to query google mobility data using the BigQuery API client python library, and visualize the mobility changes over time since February 2020.

Understanding Google mobility data

  • Google documentation on mobility data is here.
  • Mobility report does not give an absolute visitor number, but relative changes instead. These values were calculated by comparing to a baseline derived from dates before the pandemic started.
  • The baseline is the median value, for the corresponding day of the week, during the 5-week period Jan 3–Feb 6, 2020. We advise users to interpret the Google mobility trend cautiously since the baseline mobility patterns are impacted by other region-specific factors, such as weather and local events.
  • There are gaps and missing data for dates that do not meet the privacy threshold.

Setup Notebook

Using the Google BigQuery API Client library requires authentication set up. Please follow the Instruction and download the a JSON key to your local computer.

Uncomment the line(by removing the #) to install python packages if needed.

In [2]:
#!pip install numpy
#!pip install pandas
#!pip install geopandas
#!pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'
#!pip install plotly==4.14.3
#!pip install --upgrade google-cloud-bigquery
import pandas as pd
import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import json
import os
import math
from google.cloud import bigquery

# Setup the Google application credential
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="<PATH_to_Google_JSON_key>"
# Create Client object for query purpose
client = bigquery.Client()

The Google mobility data table

In [3]:
# Query for data description in google mobility data
table_sql="""
    SELECT *
    FROM `bigquery-public-data.covid19_google_mobility`.INFORMATION_SCHEMA.COLUMNS
    """

query_job = client.query(table_sql)
mobility_col_df = query_job.to_dataframe()
mobility_col_df[["column_name", "data_type"]]
Out[3]:
column_name data_type
0 country_region_code STRING
1 country_region STRING
2 sub_region_1 STRING
3 sub_region_2 STRING
4 metro_area STRING
5 iso_3166_2_code STRING
6 census_fips_code STRING
7 place_id STRING
8 date DATE
9 retail_and_recreation_percent_change_from_base... INT64
10 grocery_and_pharmacy_percent_change_from_baseline INT64
11 parks_percent_change_from_baseline INT64
12 transit_stations_percent_change_from_baseline INT64
13 workplaces_percent_change_from_baseline INT64
14 residential_percent_change_from_baseline INT64

Visualize mobility data across the United States

In [4]:
# Query for mobility data (State level) of the US
sql_us="""
  SELECT
  country_region,
  sub_region_1,
  iso_3166_2_code,
  date,
  retail_and_recreation_percent_change_from_baseline as recreation,
  grocery_and_pharmacy_percent_change_from_baseline as grocery,
  parks_percent_change_from_baseline as park,
  transit_stations_percent_change_from_baseline as transit,
  workplaces_percent_change_from_baseline as workplace,
  residential_percent_change_from_baseline as residential  
  FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
  WHERE 
    country_region_code = 'US' AND 
    sub_region_1 is not null AND
    iso_3166_2_code is not null
  order by sub_region_1, date
"""
us_query_job = client.query(sql_us)
us_mobility_df = us_query_job.to_dataframe()
us_mobility_df.describe()
Out[4]:
recreation grocery park transit workplace residential
count 19125.000000 19125.000000 18667.000000 18990.000000 19125.000000 19125.000000
mean -16.799948 -3.247529 41.857770 -22.866983 -28.031163 9.047163
std 16.366573 12.285504 75.930993 22.255382 15.727581 6.163869
min -92.000000 -79.000000 -77.000000 -89.000000 -87.000000 -5.000000
25% -26.000000 -10.000000 -11.000000 -39.000000 -38.000000 5.000000
50% -15.000000 -3.000000 19.000000 -24.000000 -29.000000 9.000000
75% -6.000000 3.000000 75.000000 -6.000000 -17.000000 13.000000
max 41.000000 61.000000 636.000000 73.000000 18.000000 36.000000

Visualize changes in retail and recreation across the US

In [5]:
# Change the format of "iso_3166_2_code" and "date"
us_mobility_df["iso_3166_2_code"]=us_mobility_df["iso_3166_2_code"].str.replace('US-', '', regex=True)
us_mobility_df = us_mobility_df.fillna(0).copy()
us_mobility_df['date'] = pd.to_datetime(us_mobility_df['date'])
us_mobility_df['date'] = us_mobility_df['date'].dt.strftime('%m/%d/%Y')

# Visualize the mobility changes at "Retail and Recreation" in the US over time
# To visualize the mobility changes in other categories, simply change the color="<category>"
us_fig =px.choropleth(us_mobility_df, 
              locations = 'iso_3166_2_code',
              color="recreation", 
              animation_frame="date",
              color_continuous_scale="Inferno",
              locationmode='USA-states',
              scope="usa",
              range_color=(-100, 20),
              labels={'recreation':'Retail and Recreation'}
             )

us_fig.update_layout(height=650,
                  margin={"r":0,"t":100,"l":0,"b":0},
                  title = "Google Mobility Trend in the US: Changes in Retail and Recreation Visits Since Feb 2020",
                  title_font_size=20)
us_fig.show()

Summary

  • The baseline was calculated between Jan and Feb 2020. As expected, the decline in retail and recreation mobility started around mid-March last year, when COVID-19 was declared as a National Emergency.
  • Drastic mobility decline (> 50%) in retail and recreation were observed throughout the country during holidays, such as Easter(April 12th), and Christmas(Dec 25).
  • Currently, a 20% decline in retail and recreation is still widely observed in the US.

Visualize mobility changes between States

In [6]:
# Compare moblity trend between States
# Four States were selected for demonstration purpose, including California, Florida, Illinois, and New York
us_df = us_mobility_df.drop(['country_region', 'sub_region_1'], axis=1)
us_df_melted = pd.melt(us_df, id_vars=["iso_3166_2_code", "date"] ,
                       value_vars=["recreation","grocery","park","transit","workplace", "residential"],
                       var_name="location", value_name='mobility')


us_df_subset = us_df_melted[us_df_melted["iso_3166_2_code"].isin(["CA", "FL", "IL", "NY"])]

fig = px.line(us_df_subset, x= "date", y= "mobility",color="location",
              facet_col="iso_3166_2_code", facet_col_wrap=1,
              line_shape='linear', render_mode="svg", hover_name="location",
              labels=dict(date="Date", mobility="Mobility (%)", location="Location"))

fig.update_layout(height=900, 
                  width=800, 
                  margin={"r":0,"t":100,"l":0,"b":0},
                  title = "Mobility Trend in Four States Since Feb 2020",
                  title_font_size=25)

fig.add_hline(y=0, line_dash="dot",
              annotation_text="Baseline",
              annotation_position="bottom right")

fig.update_yaxes(matches=None)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.update_xaxes(tickvals=["03/01/2020","05/01/2020","07/01/2020","09/01/2020","11/01/2020","01/01/2021"])
fig.add_vrect(x0="03/09/2020", x1="04/11/2020", col=1,
              annotation_text="Decline", annotation_position="top left",
              fillcolor="green", opacity=0.25, line_width=0)

fig.show()

Summary

  • A drastic decline in mobility was aobserved between mid-March and early April in all categories, except "Residential", across all four states;
  • Among these declined categories, the least reduction was observed in "Groceries and Pharmacy" since the pandemic of COVID19;
  • A significant increase in "Park" mobility was observed during summer in both the states of Illinois and New York by comparing to their baseline calculated between Jan and Feb of 2020. However, the absence of mobility increase in "Park" in some states does not imply less mobility compared to other states, such as Illinois, because the absolute number of visits is not shared in Google mobility data table.

Visualize mobility data of counties in IL

To visualize the mobility data in county data on the map, we would need the county shape files, such as IL_BNDY_County_Py.shp. These files can be downloaded with this link. Download the file and decompress the zip file before you run the cell below.

In [8]:
# Add path of illinois shape file(IL_BNDY_County_py.shp) to the line below. 
# This file can be found inside decompressed zip folder (IL_BNDY_County)
shape_f = "<PATH_to_IL_BNDY_County_py.shp>"
map_df = gpd.read_file(shape_f)
# Save the GeoJson file at local
map_df.to_file("IL_States_gpd.json", driver='GeoJSON')
# Load GeoJson file
with open("IL_States_gpd.json") as f:
  illinois_state = json.load(f)

# Modify the fips id to align with the fips id in Google mobility table
def modify_fips(number):
    newnum = '{:03d}'.format(number)
    newnum = '17' + str(newnum)
    return newnum

for i in range(len(illinois_state["features"])):
    subdict = illinois_state["features"][i]
    newfips = modify_fips(subdict["properties"]["CO_FIPS"])
    illinois_state["features"][i]["properties"]["CO_FIPS"] = newfips


# Query county level monthly mobility data from Google Mobility data table
il_month_sql = """
  SELECT
  country_region_code,
  sub_region_1,
  sub_region_2,
  census_fips_code,
  EXTRACT (YEAR FROM date) as year,
  EXTRACT (MONTH FROM date) as month,
  ROUND(AVG(retail_and_recreation_percent_change_from_baseline),2) as recreation,
  ROUND(AVG(grocery_and_pharmacy_percent_change_from_baseline),2) as grocery,
  ROUND(AVG(parks_percent_change_from_baseline),2) as park,
  ROUND(AVG(transit_stations_percent_change_from_baseline),2) as transit,
  ROUND(AVG(workplaces_percent_change_from_baseline),2) as workplace,
  ROUND(AVG(residential_percent_change_from_baseline),2) as residential
  FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
  WHERE 
    country_region_code = 'US' AND 
    sub_region_1 = "Illinois" AND
    census_fips_code is not null
  GROUP BY country_region_code, sub_region_1, sub_region_2, census_fips_code, year, month
  order by sub_region_1, sub_region_2, year, month
"""
query_job = client.query(il_month_sql)
il_month_df = query_job.to_dataframe()
il_month_df.fillna(0, inplace=True)
il_month_df.head()
Out[8]:
country_region_code sub_region_1 sub_region_2 census_fips_code year month recreation grocery park transit workplace residential
0 US Illinois Adams County 17001 2020 2 7.27 1.80 0.0 -3.33 3.07 0.15
1 US Illinois Adams County 17001 2020 3 -14.23 1.90 0.0 -17.17 -11.19 5.62
2 US Illinois Adams County 17001 2020 4 -36.70 -7.67 0.0 -6.85 -32.37 13.95
3 US Illinois Adams County 17001 2020 5 -23.94 -0.19 0.0 0.00 -26.71 11.19
4 US Illinois Adams County 17001 2020 6 -8.63 3.17 0.0 0.00 -20.37 6.35
In [9]:
# Create a new date format
il_month_df['date'] = il_month_df["year"].astype(str) + "-" + il_month_df["month"].astype(str)

# Visualize the residential mobility change in all counties of IL since Feb 2020
il_residential_df = il_month_df.drop(['recreation', 'grocery', 'park', 'transit', 'workplace'], axis=1)
il_residential_df['date'] = pd.to_datetime(il_residential_df['date'])
il_residential_df['date'] = il_residential_df['date'].dt.strftime('%m/%Y')

fig = px.choropleth_mapbox(il_residential_df, geojson=illinois_state,
                      locations='census_fips_code', 
                      color='residential',
                      color_continuous_scale="redor",
                      range_color=(-1, 23),
                      featureidkey="properties.CO_FIPS",
                      mapbox_style="carto-positron",
                      opacity=0.6,
                      center = {"lat": 40, "lon": -89.3985}, 
                      zoom=5.7,
                      hover_name = "sub_region_2",
                      animation_frame='date')
fig.update_geos(fitbounds="locations",visible=False)
fig.update_layout(height=800, 
                  width=750, 
                  margin={"r":0,"t":50,"l":0,"b":0},
                  title = "Residential Mobility Trend in IL Since Feb 2020",
                  title_font_size=30)
fig.show()

Summary

  • Significant increase in residential mobility was observed during March and April, especially in the Metropolitan area, in response to a stay-at-home order issued by Governor J. B. Pritzker back in March 2020.
  • Missing residential data were observed in several counties of IL, such as Greene County, due to privacy reasons.

Visualize mobility and COVID-19 data of Cook County, IL

Here, we demonstrate how to query information from two tables and join the information through BigQuery

In [10]:
cook_county_sql = """
SELECT
  date,
  confirmed,
  sub_region_1,
  sub_region_2,
  fips,
  Recreation,
  Park,
  Transit,
  Grocery,
  Workplace,
  Residential 
FROM (
  SELECT
    date,
    confirmed,
    fips
  FROM `bigquery-public-data.covid19_jhu_csse.summary`
  WHERE
    fips = "17031") a
FULL JOIN (
  SELECT
    sub_region_1,
    sub_region_2,
    census_fips_code as fips,
    date,
    retail_and_recreation_percent_change_from_baseline as Recreation,
    grocery_and_pharmacy_percent_change_from_baseline as Grocery,
    parks_percent_change_from_baseline as Park,
    transit_stations_percent_change_from_baseline as Transit,
    workplaces_percent_change_from_baseline as Workplace,
    residential_percent_change_from_baseline as Residential 
  FROM `bigquery-public-data.covid19_google_mobility.mobility_report`
  WHERE census_fips_code = "17031") b
USING (date, fips)
ORDER By date
"""
query_job = client.query(cook_county_sql)
cook_df = query_job.to_dataframe()
cook_df["sub_region_1"]=["Illinois"]*(cook_df.shape[0])
cook_df["sub_region_2"]=["Cook County"]*(cook_df.shape[0])
cook_df.head()
Out[10]:
date confirmed sub_region_1 sub_region_2 fips Recreation Park Transit Grocery Workplace Residential
0 2020-02-15 NaN Illinois Cook County 17031 5.0 17.0 3.0 1.0 2.0 -1.0
1 2020-02-16 NaN Illinois Cook County 17031 9.0 40.0 2.0 2.0 1.0 -2.0
2 2020-02-17 NaN Illinois Cook County 17031 2.0 -18.0 -15.0 -4.0 -24.0 6.0
3 2020-02-18 NaN Illinois Cook County 17031 1.0 6.0 2.0 0.0 1.0 0.0
4 2020-02-19 NaN Illinois Cook County 17031 1.0 1.0 1.0 -2.0 1.0 0.0
In [11]:
# Calculate daily new cases in cook county using "confirmed" column
cook_confirm = cook_df["confirmed"]
new_case = [None] * len(cook_confirm)
for i in range(len(cook_confirm)):
    if math.isnan(cook_confirm[i]):
        pass
    elif math.isnan(cook_confirm[i-1]):
        pass
    else:
        new_case[i] = cook_confirm[i]-cook_confirm[i-1]
cook_df["new_cases"] = new_case

# Prepare dataframes for ploting
cook_case_df = cook_df[["date", "confirmed", "new_cases"]]
cook_mobility_df = cook_df[["date", "Recreation", "Transit", "Park", "Grocery", "Workplace", "Residential"]]

# Plot both dataframes in one figure
colors=["rgb(166,206,227)", "rgb(31,120,180)", "rgb(178,223,138)",
        "rgb(51,160,44)", "rgb(251,154,153)", "rgb(227,26,28)"]

fig = make_subplots(rows=2, cols=1,
                   subplot_titles=("Mobility Trend in Cook County, IL since Feb, 2020", "Daily Confirmed New Cases in Cook County, IL since Feb, 2020 "),
                   row_heights=[0.7, 0.3])
fig.add_trace(
    go.Scatter(x=cook_mobility_df["date"], 
               y=cook_mobility_df["Residential"],
               mode = 'lines',
               line_shape='spline',
               name="Residential",
               line=dict(width=1.5, color=colors[0])),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=cook_mobility_df["date"], 
               y=cook_mobility_df["Workplace"],
               mode = 'lines',
               line_shape='spline',
               name="Workplace",
               line=dict(width=1.5, color=colors[1])),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=cook_mobility_df["date"], 
               y=cook_mobility_df["Park"],
               mode = 'lines',
               name="Park",
               line_shape='spline',
               line=dict(width=1.5, color=colors[3])),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=cook_mobility_df["date"], 
               y=cook_mobility_df["Recreation"],
               mode = 'lines',
               name="Recreation",
               line_shape='spline',
               line=dict(width=1.5, color=colors[2])),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=cook_mobility_df["date"], 
               y=cook_mobility_df["Grocery"],
               mode = 'lines',
               name="Grocery",
               line_shape='spline',
               line=dict(width=1.5, color=colors[4])),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=cook_mobility_df["date"], 
               y=cook_mobility_df["Transit"],
               mode = 'lines',
               name="Transit",
               line_shape='spline',
               line=dict(width=1.5, color=colors[5])),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=cook_case_df["date"], y=cook_case_df["new_cases"],
               mode = 'lines', line_shape='spline', name="New Confirmed Cases",
               line=dict(width=2)),
    row=2, col=1
)
fig.update_layout(plot_bgcolor='rgba(0,0,0,0)', margin={"r":0,"t":100,"l":0,"b":0})
fig.update_xaxes(title_text="Date")
fig.show()

Summary

  • A significant increase in park visits was observed in Cook County between June and October, 2020
  • Mobility in "Transit" and "Workplace" has decreased around 50% since the pandemic started.
  • Large numbers of daily confirmed cases of COVID-19 were observed between October 2020 and Jan 2020 in cook county, compared to the spring of 2020. However, the lower number of confirmed cases during the spring of 2020 is likely caused by limited testing capacity at the begining of the pandemic.